Main Memory Adaptive Indexing for Multi-core Systems
Adaptive indexing is a concept that treats index creation in databases as a by-product of query processing, as opposed to traditional full index creation, where the indexing effort is performed up front, before any queries are answered. Adaptive indexing has received considerable attention, and several algorithms have been proposed over the past few years, including a recent experimental study comparing a large number of existing methods. Until now, however, most adaptive indexing algorithms have been designed as single-threaded; yet with multi-core systems well established, the idea of designing parallel algorithms for adaptive indexing is very natural. So far, only one parallel algorithm for adaptive indexing has appeared in the literature: the parallel version of standard cracking. In this paper we describe three alternative parallel algorithms for adaptive indexing, including a second variant of parallel standard cracking. Additionally, we describe a hybrid parallel sorting algorithm and a NUMA-aware method based on sorting. We then thoroughly compare all these algorithms experimentally, along with a variant of a recently published parallel version of radix sort. Parallel sorting algorithms serve as a realistic baseline for multi-threaded adaptive indexing techniques. In total, we experimentally compare seven parallel algorithms and extensively profile each of them. The initial set of experiments in this paper indicates that our parallel algorithms significantly improve over previously known ones. Our results suggest that, although adaptive indexing algorithms are a good design choice in single-threaded environments, the rules change considerably in the parallel case: in future highly parallel environments, sorting algorithms could be serious alternatives to adaptive indexing.
Comment: 26 pages, 7 figures
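The core idea of standard cracking can be sketched in a few lines: each range query partitions only the pieces of the column it touches, so the column becomes incrementally sorted as a by-product of query answering. The following is a minimal single-threaded illustration (names like `CrackedColumn` are ours, not the paper's; real implementations keep the cracker index in a balanced tree rather than two parallel lists):

```python
import bisect

class CrackedColumn:
    """Minimal sketch of standard database cracking (illustrative only)."""

    def __init__(self, values):
        self.col = list(values)
        self.pivots = []      # sorted pivot values (the "cracker index" keys)
        self.positions = []   # invariant: col[:positions[i]] holds all values < pivots[i]

    def _crack(self, pivot):
        # Locate the piece [lo, hi) that the pivot falls into.
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]          # already cracked on this pivot
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.col)
        # Hoare-style in-place partition of that piece around the pivot.
        j, k = lo, hi - 1
        while j <= k:
            if self.col[j] < pivot:
                j += 1
            else:
                self.col[j], self.col[k] = self.col[k], self.col[j]
                k -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, j)
        return j

    def range_query(self, lo_key, hi_key):
        """Return values v with lo_key <= v < hi_key; cracks as a side effect."""
        a = self._crack(lo_key)
        b = self._crack(hi_key)
        return self.col[a:b]
```

Each query pays only for partitioning the pieces it touches, which is exactly the cost profile that makes the single-threaded case attractive and the parallel case (concurrent in-place partitioning of shared pieces) hard.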
On Closing the Circle of Algorithmic and System-oriented Database Optimization: A Comprehensive Study of Adaptive Indexing, Data Partitioning, and the Rewiring of Virtual Memory
Over the decades, as computing resources grew, the amount of data to manage also increased tremendously. Besides the sheer quantity of information, its quality varies widely today.
Indexing all this data with equal effort is cumbersome and wasteful. Adaptive indexing algorithms therefore refine the parts of interest more carefully. Unfortunately, this adaptivity also introduces a set of new problems: high variance in response times and low robustness against certain workloads are just two of them. A vast number of methods have been proposed to deal with these problems. In the first part of this thesis, we therefore reinvestigate, analyze, and enhance the class of adaptive indexing methods in a comprehensive evaluation at the algorithmic level. In total, we discuss 18 cracking methods, 6 sorting algorithms, and 3 full index structures, including our own proposed methods.
Consequently, we identify data partitioning as the common component. In the second part, we therefore analyze the surprisingly large number of optimizations that can enhance partitioning. Interestingly, they mostly originate from a more sophisticated mapping of the method to the system properties, shifting our perspective to a system-centric view.
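The partitioning component shared by these methods is, at its core, a histogram pass, a prefix sum, and a scatter. The sketch below shows that skeleton for a single out-of-place radix partitioning pass (the system-level optimizations the thesis studies, such as software-managed buffers and non-temporal stores, live below the level Python can express):

```python
def radix_partition(values, bits):
    """Single-pass out-of-place radix partitioning over the low `bits` bits:
    build a histogram, turn it into write cursors via an exclusive prefix sum,
    then scatter each value to its partition while preserving input order."""
    n_parts = 1 << bits
    mask = n_parts - 1
    hist = [0] * n_parts
    for v in values:
        hist[v & mask] += 1
    # Exclusive prefix sum yields the start offset of each partition.
    offsets, total = [], 0
    for h in hist:
        offsets.append(total)
        total += h
    out = [None] * len(values)
    cursors = list(offsets)
    for v in values:
        p = v & mask
        out[cursors[p]] = v
        cursors[p] += 1
    return out, offsets
```

The scatter loop is where the hardware-conscious variants differ: they buffer writes per partition to avoid random cache misses, while the logical three-phase structure stays the same.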
Subsequently, in the third part, we descend to the ground level by exploiting a core feature of every modern operating system: the virtual memory system. We investigate how virtual and physical memory can be separated in user space and how the mappings between the two memory types can be rewired freely at run time. Using rewiring, we are able to significantly enhance core applications of data management systems.
Finally, we apply the techniques identified in this thesis to the initial adaptive indexing algorithm to significantly improve it, and close the circle.
An experimental evaluation and analysis of database cracking
Database cracking has been an area of active research in recent years. The core idea of database cracking is to create indexes adaptively and incrementally as a side product of query processing. Several works have proposed different cracking techniques for different aspects, including updates, tuple reconstruction, convergence, concurrency control, and robustness. Our VLDB 2014 paper "The Uncracked Pieces in Database Cracking" (PVLDB 7:97–108) was the first comparative study of these methods by an independent group. In this article, we extend our published experimental study on database cracking and bring it up to date. Our goal is to critically review several aspects, identify the potential, and propose promising directions in database cracking. With this study, we hope to expand the scope of database cracking and possibly leverage cracking in database engines other than MonetDB. We repeat several prior database cracking works, including the core cracking algorithms as well as three other works on convergence (hybrid cracking), tuple reconstruction (sideways cracking), and robustness (stochastic cracking). In addition to our conference paper, we now also look at a recently published study on CPU efficiency (predication cracking). We evaluate these works and show possible directions to do even better. As a further extension, we evaluate the whole class of parallel cracking algorithms that were proposed in three recent works. Altogether, in this work we revisit 8 papers on database cracking and evaluate in total 18 cracking methods, 6 sorting algorithms, and 3 full index structures. Additionally, we test cracking under a variety of experimental settings, including high-selectivity queries (low selectivity means that many entries qualify; high selectivity means that only few entries qualify), low-selectivity queries, varying selectivity, and multiple query access patterns.
Finally, we compare cracking against different sorting algorithms as well as against different main-memory optimized indexes, including the recently proposed adaptive radix tree (ART). Our results show that: (1) the previously proposed cracking algorithms are repeatable, (2) there is still enough room to significantly improve the previously proposed cracking algorithms, (3) parallelizing cracking algorithms efficiently is a hard task, (4) cracking depends heavily on query selectivity, (5) cracking needs to catch up with modern indexing trends, and (6) different indexing algorithms have different indexing signatures.
Funding: Germany, Federal Ministry of Education and Research
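The sorting baseline that cracking is compared against is conceptually simple: pay the full sort cost on the first query, then answer every range query with two binary searches. A minimal sketch of that full-index extreme (our illustration, not any specific evaluated implementation):

```python
import bisect

class SortedColumn:
    """Full-sort baseline: all indexing effort up front, then O(log n) lookups."""

    def __init__(self, values):
        self.col = sorted(values)   # entire indexing cost paid here, once

    def range_query(self, lo_key, hi_key):
        """Return values v with lo_key <= v < hi_key, already in sorted order."""
        a = bisect.bisect_left(self.col, lo_key)
        b = bisect.bisect_left(self.col, hi_key)
        return self.col[a:b]
```

Cracking sits between this and a plain scan: it amortizes the sort cost over queries, which is why the comparison hinges so strongly on how many queries arrive and how selective they are.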
The Case for Automatic Database Administration using Deep Reinforcement Learning
Like any large software system, a full-fledged DBMS offers an overwhelming number of configuration knobs. These range from static initialisation parameters, like buffer sizes, degree of concurrency, or level of replication, to complex runtime decisions, like creating a secondary index on a particular column or reorganising the physical layout of the store. To simplify configuration, industry-grade DBMSs usually ship with various advisory tools that provide recommendations for given workloads and machines. In practice, however, the actual configuration, tuning, and maintenance is usually still done by a human administrator relying on intuition and experience. Recent work on deep reinforcement learning has shown very promising results in solving problems that require such a sense of intuition; for instance, it has been applied very successfully to learning how to play complicated games with enormous search spaces. Motivated by these achievements, in this work we explore how deep reinforcement learning can be used to administer a DBMS. First, we describe how deep reinforcement learning can be used to automatically tune an arbitrary software system, such as a DBMS, by defining a problem environment. Second, we showcase our concept, NoDBA, on the concrete example of index selection and evaluate how well it recommends indexes for given workloads.
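To make the problem-environment framing concrete, here is a toy index-selection environment in the spirit of this setup, with tabular Q-learning standing in for the deep network. All names, the cost model, and the reward shaping are our assumptions for illustration; they are not the paper's exact formulation:

```python
import random

random.seed(0)

# Toy environment: state = set of indexed columns, action = index one more
# column, reward = drop in estimated workload cost minus a storage penalty.
N_COLS = 4
WORKLOAD = [0, 0, 2, 0, 3]            # column each query filters on (col 1 unused)
SCAN_COST, INDEX_COST, PENALTY = 10.0, 1.0, 2.0

def workload_cost(indexed):
    """Estimated cost: cheap index lookup if the column is indexed, else a scan."""
    return sum(INDEX_COST if c in indexed else SCAN_COST for c in WORKLOAD)

def step(indexed, col):
    before = workload_cost(indexed)
    nxt = indexed | {col}
    reward = before - workload_cost(nxt) - PENALTY
    return nxt, reward

# Tabular Q-learning (stand-in for the deep network; with 4 columns the
# state space is small enough to learn exactly). Episodes index every column.
Q = {}
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(500):
    state = frozenset()
    for _ in range(N_COLS):
        actions = [c for c in range(N_COLS) if c not in state]
        if random.random() < eps:
            a = random.choice(actions)                      # explore
        else:
            a = max(actions, key=lambda c: Q.get((state, c), 0.0))  # exploit
        nxt, r = step(state, a)
        nxt_actions = [c for c in range(N_COLS) if c not in nxt]
        best_next = max((Q.get((nxt, c), 0.0) for c in nxt_actions), default=0.0)
        q = Q.get((state, a), 0.0)
        Q[(state, a)] = q + alpha * (r + gamma * best_next - q)
        state = nxt

# Recommendation: the most valuable first index according to the learned values.
best_first = max(range(N_COLS), key=lambda c: Q.get((frozenset(), c), 0.0))
```

The heavily filtered column 0 should accumulate a higher learned value than the never-queried column 1; in the real setting, a neural network replaces the Q table so that unseen workload/state combinations can be generalised over.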